pyDuCo

Abstract

pyDuCo is a metadata manager for use in intermittent, bespoke scripts in a scientific context. It follows the Dublin Core™ metadata specification.

pyDuCo is open source, and the repository is available on GitLab.

Measures:

A-2-2, A-2-3, A-4-2, A-4-3

Description

Bespoke automation for processing research data is an everyday activity in scientific contexts. Python is a popular and suitable programming language for this, allowing efficient processing of growing amounts of data. As datasets grow in size and complexity, metadata management becomes more crucial for researchers who need to keep track of their research data and its inherent links and relationships. This metadata, i.e. sources, references, accrual methods, targets and versions, is well known while the bespoke automation scripts are being implemented, and is often even accessible as Python objects at runtime. Still, it is often lost once the task is completed and the bespoke scripts evolve or are adapted to process the next dataset.

pyDuCo is a Python metadata manager package for use in intermittent and ever-evolving data processing scripts. It provides functionality to easily store and interconnect relevant metadata and add it to research datasets, following the notion of a “metadata pipeline”. It conforms to the Dublin Core™ metadata specification, helping researchers improve standardization, interoperability and machine-searchability.
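The idea of a “metadata pipeline” can be illustrated with a minimal sketch that uses only the Python standard library. This is not pyDuCo's API: the class `DublinCoreRecord`, its `derive` and `to_xml` methods, and all identifiers and values below are hypothetical and chosen purely for illustration. Only the element names (`identifier`, `title`, `creator`, `date`, `source`) and the namespace URI are taken from the Dublin Core™ element set.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


@dataclass
class DublinCoreRecord:
    """Hypothetical holder for a few Dublin Core elements (not pyDuCo's API)."""

    identifier: str
    title: str
    creator: str
    date: str
    source: List[str] = field(default_factory=list)

    def derive(self, identifier: str, title: str, date: str) -> "DublinCoreRecord":
        """Describe a dataset produced from this one and keep the link to it."""
        return DublinCoreRecord(
            identifier=identifier,
            title=title,
            creator=self.creator,
            date=date,
            source=[self.identifier],
        )

    def to_xml(self) -> str:
        """Serialize the record as XML using namespaced Dublin Core elements."""
        root = ET.Element("metadata")
        for name in ("identifier", "title", "creator", "date"):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = getattr(self, name)
        for src in self.source:
            ET.SubElement(root, f"{{{DC_NS}}}source").text = src
        return ET.tostring(root, encoding="unicode")


# Example values are invented for illustration.
raw = DublinCoreRecord(
    identifier="raw-001",
    title="Raw measurements",
    creator="A. Researcher",
    date="2024-01-15",
)
filtered = raw.derive(
    identifier="filtered-001",
    title="Filtered measurements",
    date="2024-01-16",
)
print(filtered.to_xml())
```

In this sketch, each processing step creates the record of its output from the record of its input, so the `source` link is written at the moment it is known instead of being reconstructed later.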

Status

Planned Activities

  • Extension of the adapters in pyDuCo to automatically extract more metadata from more data types
  • Creation of automatically generated documentation based on the existing docstrings

In-progress Activities

Feature extensions are in progress:

  • Implementation of extensive examples to showcase pyDuCo’s features and intended use cases
  • Implementation of a CI/CD pipeline, based on the existing test suite

Completed Activities

  • First implementation as a Python package, including a test suite. (Q3/2023)
  • First inclusion into an internal tool for parameter analysis, adding features and improvements as required. (Q1/2024)
  • Release as public repository. (Q2/2024)

Results

Version v1.0

Acknowledgements

pyDuCo is being developed within the project NFDI4Ing.

Funded by the German Research Foundation (DFG) - project number 442146713.